Handling Graceful Shutdowns in AWS Lambda
Introduction #
Serverless computing, or Function as a Service (FaaS), revolutionizes development by abstracting away the complexities of infrastructure management. You write your code, deploy it, and it seamlessly executes on demand, optimizing costs by charging only for its execution time. Yet, what if you need to perform crucial cleanup tasks before function shutdown? Consider scenarios like sending essential telemetry data to your monitoring system. In this blog post, I’ll guide you through the intricate process of executing these essential pre-shutdown operations.
The Problem #
When you deploy your function to AWS Lambda, you can configure the memory size and the timeout. The timeout is the maximum time a single invocation can run: the invocation ends when your code returns or when the timeout limit is reached, whichever comes first. Since you don’t control the shutdown phase, you don’t know whether tasks like flushing telemetry data will be completed, so you need a way to handle this. One workaround is adding a delay at the end of your function to give it some extra time for cleanup. But this is not a good solution: you don’t know how long your function will run, so you don’t know how long the delay should be, and you are paying for the extra time your function is running.
AWS Lambda supports a shutdown hook when an extension is registered. That hook is called when the execution environment is being terminated, so there is some time left to do cleanup.
When the environment is about to stop, Lambda sends a SIGTERM signal to the runtime process. Code running in that process can listen for this signal and do the extra work that is needed.
A Lambda lifecycle consists of three phases:
- Init: In this phase, Lambda creates or unfreezes an execution environment with the configured resources, downloads the code for the function and all layers, initializes any extensions, initializes the runtime, and then runs the function’s initialization code (the code outside the main handler). The Init phase happens either during the first invocation, or in advance of function invocations if you have enabled provisioned concurrency. The Init phase is split into three sub-phases: Extension init, Runtime init, and Function init. These sub-phases ensure that all extensions and the runtime complete their setup tasks before the function code runs.
- Invoke: In this phase, Lambda invokes the function handler. After the function runs to completion, Lambda prepares to handle another function invocation.
- Shutdown: This phase is triggered if the Lambda function does not receive any invocations for a period of time. In the Shutdown phase, Lambda shuts down the runtime, alerts the extensions to let them stop cleanly, and then removes the environment. Lambda sends a Shutdown event to each extension, which tells the extension that the environment is about to be shut down.
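To make the Shutdown event more concrete, here is a minimal sketch of how an external extension could register with the Lambda Extensions API and wait for that event. The class name `ShutdownAwareExtension`, the extension name, and the `onShutdown` callback are hypothetical and only serve as illustration; in a real extension the `runtimeApi` value would come from the `AWS_LAMBDA_RUNTIME_API` environment variable.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical minimal extension loop; names are illustrative, not from the article.
public static class ShutdownAwareExtension
{
    // Returns the shutdown reason reported by Lambda (empty if none was sent).
    public static async Task<string> RunAsync(HttpClient http, string runtimeApi, Func<Task> onShutdown)
    {
        var baseUrl = $"http://{runtimeApi}/2020-01-01/extension";

        // Register for both Invoke and Shutdown events.
        using var register = new HttpRequestMessage(HttpMethod.Post, $"{baseUrl}/register")
        {
            Content = new StringContent("{\"events\":[\"INVOKE\",\"SHUTDOWN\"]}", Encoding.UTF8, "application/json")
        };
        // Assumption: the deployed extension file carries this same name.
        register.Headers.Add("Lambda-Extension-Name", "shutdown-aware-extension");
        using var registered = await http.SendAsync(register);
        registered.EnsureSuccessStatusCode();
        var extensionId = string.Join("", registered.Headers.GetValues("Lambda-Extension-Identifier"));

        while (true)
        {
            // Long-poll: this request completes when the next event arrives.
            using var next = new HttpRequestMessage(HttpMethod.Get, $"{baseUrl}/event/next");
            next.Headers.Add("Lambda-Extension-Identifier", extensionId);
            using var response = await http.SendAsync(next);
            response.EnsureSuccessStatusCode();

            using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
            var eventType = doc.RootElement.GetProperty("eventType").GetString();
            if (eventType != "SHUTDOWN")
            {
                continue; // Invoke events need no work in this sketch.
            }

            await onShutdown();
            if (doc.RootElement.TryGetProperty("shutdownReason", out var reason))
            {
                return reason.GetString() ?? "";
            }
            return "";
        }
    }
}
```

Because the `event/next` call is a long poll, the `HttpClient` passed in should be created with `Timeout = Timeout.InfiniteTimeSpan`; the default timeout would otherwise cancel the request while the environment is idle.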
We are only interested in the Shutdown phase, so let’s take a closer look at it.
Duration limit: The maximum duration of the Shutdown phase depends on the configuration of registered extensions:
- 0 ms – A function with no registered extensions
- 500 ms – A function with a registered internal extension
- 2,000 ms – A function with one or more registered external extensions
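So the budget depends on the type of extension, not on how many you register. If your only extension is an internal one, you have 500 ms of extra time to do some cleanup. As soon as at least one external extension is registered, you have 2,000 ms. Check how your OpenTelemetry extension is packaged to know which limit applies to you.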
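The limits above can be written down as a small lookup, which is handy when you want to derive a cleanup timeout in code. This is only a sketch: `ExtensionKind` and `ShutdownBudget` are hypothetical names, and treating a mix of internal and external extensions as the 2,000 ms case is my reading of "one or more registered external extensions".

```csharp
using System;

// Hypothetical helper types; they are not part of any AWS SDK.
public enum ExtensionKind { Internal, External }

public static class ShutdownBudget
{
    // Maps the registered extensions to the Shutdown phase limit listed above.
    public static TimeSpan For(params ExtensionKind[] registered)
    {
        if (registered.Length == 0)
        {
            return TimeSpan.Zero; // no extensions: no Shutdown phase time
        }

        foreach (var kind in registered)
        {
            if (kind == ExtensionKind.External)
            {
                return TimeSpan.FromMilliseconds(2000); // one or more external extensions
            }
        }

        return TimeSpan.FromMilliseconds(500); // internal extensions only
    }
}
```

For example, `ShutdownBudget.For(ExtensionKind.Internal)` yields 500 ms, while adding a single `ExtensionKind.External` entry raises the result to 2,000 ms.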
Now that we understand how a Lambda function is terminated and how much time we have to do some cleanup, how can we do this?
The Old Way #
In the old days of .NET, there was this C# event called AppDomain.ProcessExit. This event is raised when the process hosting the default application domain exits, so you could use it to do some cleanup. But this event is not raised when your function is terminated, so we need something else.
The New Way #
.NET 6 introduced a new feature for handling POSIX signals. POSIX signals are low-level asynchronous notifications that are sent to a process to notify it of a particular event, for example that the process is about to be terminated.
The main thing to use is the PosixSignalRegistration class, which allows you to register a callback for a POSIX signal.
There are a number of signals that can be handled by the .NET runtime; they are listed in the PosixSignal enum.
For now, we are only interested in the SIGTERM signal.
Registration for the SIGTERM signal is done by calling the following method:
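public void FunctionHandler(SNSEvent snsEvent, ILambdaContext context)
{
    // Register once and keep the result in a static field (_sigtermRegistration):
    // disposing the registration removes the handler again.
    _sigtermRegistration ??= PosixSignalRegistration.Register(PosixSignal.SIGTERM, HandleSigterm);

    // DiagnosticSource is assumed to be a static ActivitySource defined on the class.
    using var activity = DiagnosticSource.StartActivity("ProcessOrder");
    // orderId is parsed from the SNS message (omitted here).
    activity?.SetTag("order_id", orderId);
    // process the order
}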
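The HandleSigterm method is called when the SIGTERM signal is received. In this case, we want to stop the current activity and flush the telemetry data.
Since the activity variable is local to the function handler and can’t be used in the HandleSigterm method, we have to use the Activity.Current property to get the current activity. The callback receives a PosixSignalContext, which exposes the signal that was raised. Keep in mind that Activity.Current is tracked per execution context, so it can be null inside the callback; the null-conditional call covers that case.
private static void HandleSigterm(PosixSignalContext context)
{
    Console.WriteLine($"Received signal: {context.Signal}");
    // Activity.Current may be null here, hence the null-conditional call.
    Activity.Current?.Stop();
}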
Now we have a way to handle the SIGTERM signal without using a delay to ensure your functions have enough time to send the telemetry data.
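Since the Shutdown phase is time-limited, it can help to bound the cleanup work explicitly. The following is a minimal sketch of that idea, assuming a hypothetical `GracefulShutdown` class and a `cleanup` delegate that stands in for whatever flush call your telemetry setup provides; neither name comes from a library.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper that runs a cleanup delegate on SIGTERM within a time budget.
public sealed class GracefulShutdown : IDisposable
{
    private readonly Func<CancellationToken, Task> _cleanup;
    private readonly TimeSpan _budget;
    private readonly PosixSignalRegistration _registration;

    public GracefulShutdown(Func<CancellationToken, Task> cleanup, TimeSpan budget)
    {
        _cleanup = cleanup;
        _budget = budget;
        // Holding the registration in a field keeps the handler registered.
        _registration = PosixSignalRegistration.Register(PosixSignal.SIGTERM, OnSigterm);
    }

    private void OnSigterm(PosixSignalContext context)
    {
        // Blocks the signal callback until cleanup finishes or the budget runs out.
        // context.Cancel stays false, so default SIGTERM handling continues afterwards.
        RunCleanup();
    }

    // Returns true when the cleanup completed within the budget.
    public bool RunCleanup()
    {
        using var cts = new CancellationTokenSource(_budget);
        try
        {
            return _cleanup(cts.Token).Wait(_budget);
        }
        catch (AggregateException)
        {
            // The cleanup task was cancelled or failed.
            return false;
        }
    }

    public void Dispose() => _registration.Dispose();
}
```

A budget somewhat below the applicable Shutdown limit leaves headroom for the process to exit on its own before Lambda ends it.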
A complete example can be found here.
The termination-related signals that .NET can handle include:
- SIGTERM: The SIGTERM signal is sent to a process to request its termination. Unlike the SIGKILL signal, it can be caught and interpreted or ignored by the process. This allows the process to terminate gracefully, releasing resources and saving state if appropriate. SIGINT is nearly identical to SIGTERM.
- SIGINT: The SIGINT signal is sent to a process by its controlling terminal when a user wishes to interrupt the process. This is typically initiated by pressing Ctrl+C, but on some systems, the “delete” character or “break” key can be used.
- SIGQUIT: The SIGQUIT signal is sent to a process by its controlling terminal when the user requests that the process quit and perform a core dump.
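If you want the same callback for all three signals, one registration per signal is needed. Here is a short sketch of that; `TerminationSignals` and `RegisterAll` are hypothetical helper names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical helper that registers one callback for several termination signals.
public static class TerminationSignals
{
    // The caller must keep the returned registrations alive and dispose them when done.
    public static IReadOnlyList<PosixSignalRegistration> RegisterAll(Action<PosixSignal> onSignal)
    {
        var signals = new[] { PosixSignal.SIGTERM, PosixSignal.SIGINT, PosixSignal.SIGQUIT };
        var registrations = new List<PosixSignalRegistration>();

        foreach (var signal in signals)
        {
            // Each Register call returns its own registration object.
            registrations.Add(PosixSignalRegistration.Register(signal, context => onSignal(context.Signal)));
        }

        return registrations;
    }
}
```

Disposing a registration removes its handler, so the returned list is best stored in a static field for the lifetime of the process.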
The Solution #
In this example, we have a Lambda function receiving an order and we want to trace this.
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.DocumentModel;
using Amazon.Lambda.Core;
using Amazon.Lambda.SNSEvents;
using Newtonsoft.Json.Linq;

[assembly: LambdaSerializer(typeof(Amazon.Lambda.Serialization.Json.JsonSerializer))]

namespace ProcessOrderFunction
{
    public class Function
    {
        // The source name is illustrative; use the name your tracing setup listens to.
        private static readonly ActivitySource DiagnosticSource = new("ProcessOrderFunction");

        // Keep a reference so the SIGTERM registration stays active.
        private static PosixSignalRegistration _sigtermRegistration;

        private readonly IAmazonDynamoDB _dynamoDbClient;

        public Function()
        {
            _dynamoDbClient = new AmazonDynamoDBClient();

            // Register the SIGTERM handler once, not on every invocation.
            _sigtermRegistration ??= PosixSignalRegistration.Register(PosixSignal.SIGTERM, HandleSigterm);
        }

        public async Task FunctionHandler(SNSEvent snsEvent, ILambdaContext context)
        {
            var table = Table.LoadTable(_dynamoDbClient, "OrdersTable");

            foreach (var record in snsEvent.Records)
            {
                var message = record.Sns.Message;
                var orderInfo = JObject.Parse(message);
                var orderId = orderInfo["order_id"].ToString();

                // The activity is stopped when it is disposed at the end of each iteration.
                using var activity = DiagnosticSource.StartActivity("ProcessOrder");
                activity?.SetTag("order_id", orderId);

                var orderData = new Document
                {
                    ["order_id"] = orderId,
                    ["order_details"] = message
                };

                try
                {
                    await table.PutItemAsync(orderData);
                    context.Logger.LogLine($"Order {orderId} has been stored successfully.");
                }
                catch (Exception ex)
                {
                    context.Logger.LogLine($"Unable to store order {orderId}: {ex.Message}");
                    throw;
                }
            }
        }

        // Called when Lambda sends SIGTERM during the Shutdown phase.
        private static void HandleSigterm(PosixSignalContext context)
        {
            Console.WriteLine($"Received signal: {context.Signal}");
            // Activity.Current may be null here, hence the null-conditional call.
            Activity.Current?.Stop();
        }
    }
}
Conclusion #
Adding a graceful shutdown mechanism, such as handling the SIGTERM signal through the POSIX signal support in .NET 6+, gives tasks like submitting OpenTelemetry data a chance to run before a Lambda execution environment is shut down. As long as the cleanup fits within the Shutdown phase limit, this helps keep your telemetry data complete, which makes monitoring and analysis of your serverless applications more reliable.